Adaptive Hausdorff distances and dynamic clustering of symbolic interval data
Abstract
This paper presents a partitional dynamic clustering method for interval data based on adaptive Hausdorff distances. Dynamic clustering algorithms are iterative two-step relocation algorithms: at each iteration they construct the clusters and identify a suitable representation or prototype (means, axes, probability laws, groups of elements, etc.) for each cluster by locally optimizing an adequacy criterion that measures the fit between the clusters and their corresponding representatives. In this paper, each pattern is represented by a vector of intervals, and adaptive Hausdorff distances are used to compare two interval vectors. At each iteration, the adaptive distances change for each cluster according to its intra-class structure. The advantage of these adaptive distances is that the clustering algorithm is able to recognize clusters of different shapes and sizes. To evaluate this method, experiments with real and synthetic interval data sets were performed. The evaluation is based on an external cluster validity index (the corrected Rand index) in the framework of a Monte Carlo experiment with 100 replications. These experiments showed the usefulness of the proposed method. © 2005 Elsevier B.V. All rights reserved.
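The two-step scheme described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the abstract gives no formulas, so the sketch assumes that the distance between two intervals [a, b] and [c, d] is the Hausdorff distance max(|a - c|, |b - d|), that distances are summed over variables, that each cluster carries one weight per variable with the weights constrained to have product 1, and that the prototype is built from the medians of the midpoints and half-lengths. All function names and parameters (`hausdorff_per_variable`, `median_prototype`, `adaptive_weights`, `dynamic_clustering`, `eps`) are hypothetical.

```python
import numpy as np

def hausdorff_per_variable(x, g):
    """Hausdorff distance between intervals, variable by variable.
    x, g: arrays of shape (..., p, 2) holding [lower, upper] bounds."""
    return np.maximum(np.abs(x[..., 0] - g[..., 0]),
                      np.abs(x[..., 1] - g[..., 1]))

def median_prototype(members):
    """Assumed prototype: median midpoint and median half-length per variable.
    members: array of shape (m, p, 2)."""
    mid = np.median(members.mean(axis=-1), axis=0)
    half = np.median((members[..., 1] - members[..., 0]) / 2, axis=0)
    return np.stack([mid - half, mid + half], axis=-1)

def adaptive_weights(members, prototype, eps=1e-12):
    """Assumed adaptive step: one weight per variable for this cluster,
    inversely proportional to the within-cluster spread, with product 1."""
    spread = hausdorff_per_variable(members, prototype).sum(axis=0) + eps
    geo_mean = np.exp(np.mean(np.log(spread)))
    return geo_mean / spread

def dynamic_clustering(X, k, n_iter=20, seed=0):
    """X: array of shape (n, p, 2). Returns labels, prototypes, weights."""
    rng = np.random.default_rng(seed)
    n, p, _ = X.shape
    prototypes = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.ones((k, p))
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Allocation step: assign each pattern to the closest prototype
        # under that cluster's own weighted distance.
        d = hausdorff_per_variable(X[:, None], prototypes[None])  # (n, k, p)
        new_labels = np.argmin((d * weights[None]).sum(axis=-1), axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
        # Representation step: update prototype and weights of each cluster.
        for c in range(k):
            members = X[labels == c]
            if len(members) == 0:
                continue  # keep the previous prototype for an empty cluster
            prototypes[c] = median_prototype(members)
            weights[c] = adaptive_weights(members, prototypes[c])
    return labels, prototypes, weights
```

Because the weights of a cluster are recomputed from that cluster's own spread, a variable along which the cluster is tight receives a large weight and a variable along which it is elongated receives a small one, which is how the sketch lets clusters differ in shape and size. The `eps` term only guards against a zero spread and is an implementation convenience, not part of the method.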
Similar resources
Adaptative Hausdorff Distances and Dynamic Clustering of Symbolic Interval Data
This paper presents a partitional dynamic clustering method for interval data based on adaptive Hausdorff distances. Dynamic clustering algorithms are iterative two-step relocation algorithms involving the construction of the clusters at each iteration and the identification of a suitable representation or prototype (means, axes, probability laws, groups of elements, etc.) for each cluster by l...
Hausdorff Distance Measure Based Interval Fuzzy Possibilistic C-Means Clustering Algorithm
Clustering algorithms have been widely used in artificial intelligence, data mining, machine learning, and related fields. Clustering is an unsupervised classification that divides a data set into groups: similar data are placed in the same group, and dissimilar data are placed in different groups. In general, analyzing interval data requires Type II fuzzy log...
Clustering Interval-valued Data Using an Overlapped Interval Divergence
As a common problem in data clustering applications, how to identify a suitable proximity measure between data instances is still an open problem. Especially when interval-valued data is becoming more and more popular, it is expected to have a suitable distance for intervals. Existing distance measures only consider the lower and upper bounds of intervals, but overlook the overlapped area betwe...
Fuzzy c-means clustering methods for symbolic interval data
This paper presents adaptive and non-adaptive fuzzy c-means clustering methods for partitioning symbolic interval data. The proposed methods furnish a fuzzy partition and prototype for each cluster by optimizing an adequacy criterion based on suitable squared Euclidean distances between vectors of intervals. Moreover, various cluster interpretation tools are introduced. Experiments with real an...
Multidimensional Interval-Data: Metrics and Factorial Analysis
Statistical units described by interval-valued variables represent a special case of Symbolic Objects, where all descriptors are quantitative variables. In this context, the paper presents two different metrics in R^p for interval-valued data that are based on the definition of the Hausdorff distance in R. Hausdorff distance in R^p (for any p ≥ 1) is an L∞ norm between pairs of closed sets. However,...
Journal:
- Pattern Recognition Letters
Volume 27
Publication date: 2006